Abstract. We study a variation of the graph colouring problem on random graphs
of finite average connectivity. Given the number of colours, we aim to maximise the 
number of different colours at neighbouring vertices (i.e. one edge distance) of any 
vertex. Two efficient algorithms, belief propagation and Walksat, are adapted to carry
out this task. We present experimental results based on two types of random graphs 
for different system sizes and identify the critical value of the connectivity for the 
algorithms to find a perfect solution. The problem and the suggested algorithms 
have practical relevance since various applications, such as distributed storage, can 
be mapped onto this problem. 



PACS numbers: 89.75.-k, 02.60.Pn, 75.10.Nr 
1. Introduction 

The graph colouring problem [2] has received a significant level of attention. Much
of this interest stems from the fact that many real-world optimisation problems can be
represented as colouring problems. In the original formulation, given q colours, we 
aim at finding a colouring solution such that any two connected vertices have different 
colours. Here, the aim is to maximise the number of colours at one edge distance of any 
vertex. 

One application can be found in the field of logistics, where each vertex represents a 
storage unit. The problem is then to find how to distribute the different types of goods 
such that, at each site, any type can be retrieved either from the given unit or from 
directly adjacent storage units. The application that motivated this work
is that of distributed data storage, where files are divided into a number of segments,
which are then distributed over the graph representing the network. Nodes requesting 
a particular file collect the required number of file segments from neighbouring nodes 
to retrieve the original information. Distributed storage is used in many real-world
applications such as OceanStore [3].

It should be emphasised that the main problem we are interested in solving has
several properties that should be taken into consideration when one considers a colour
assignment algorithm: 1) The problem is characterised by a medium number of different
file segments. 2) An adaptive assignment of colours may be required as the (arbitrary)
topology continuously changes due to the emergence and disappearance of nodes. 3)
The networks considered are of moderate size, 100-1000 nodes. All this calls for an
efficient and fast algorithm that can handle colour assignment in large systems.

Although this problem has not yet been shown to be NP-complete, it seems 
nonetheless intractable for a large system size. Since no research has been carried out 
on this specific problem, no dedicated tools exist either‡. However, as we report in this
paper, existing optimisation algorithms can be adapted quite easily to solve this and 
similar problems. In particular, we investigate two well established techniques: belief 
propagation (BP) and a variant of Walksat (WSAT) for this purpose.

In this paper, we show how message passing techniques (BP) can be used to solve 
this particular hard computational problem, and use the results to identify the transition 
point in terms of the connectivity above which the algorithms are able to find a perfect 
colouring for the graphs. For a given number of colours q, we identify the critical 
connectivity above which graphs are typically colourable by the algorithms, as well 
as the average minimum measure for the unsatisfaction $E^u(\lambda)$ as a function of the
connectivity $\lambda$. In a general setup, the measure of unsatisfaction is
$E^u(\lambda) = \sum_{i=1}^{n} E_i^u(\lambda)$, where for each vertex $i$ with local connectivity $\lambda_i$

$$E_i^u(\lambda) = \min(q, \lambda_i + 1) - q_i \qquad (1)$$

is the difference of the maximal number of available colours (from itself and its nearest
neighbours, i.e. $\min(q, \lambda_i + 1)$), and the number of actually available colours at that
node, $q_i$. In this paper, we only consider graphs with local connectivities $\lambda_i \ge q - 1$, such
that $E_i^u(\lambda) = q - q_i$ just counts the number of missing colours. One should note that,
contrary to the original graph colouring problem, the problem of finding a colouring for 
our problem actually becomes easier with increasing connectivity. 
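
As a concrete illustration of this measure, the following minimal Python sketch counts the missing colours at each vertex and sums them over the graph. The adjacency-list representation, the function names and the toy ring graph are assumptions made for this example only; they are not taken from the original implementation.

```python
def vertex_unsatisfaction(v, adj, colour, q):
    """Missing colours at vertex v: min(q, degree + 1) minus the colours in reach."""
    available = {colour[v]} | {colour[u] for u in adj[v]}
    return min(q, len(adj[v]) + 1) - len(available)


def total_unsatisfaction(adj, colour, q):
    """Sum of the per-vertex unsatisfaction over the whole graph."""
    return sum(vertex_unsatisfaction(v, adj, colour, q) for v in adj)


# Hypothetical toy graph: a ring of 6 vertices, coloured with q = 3 colours.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
perfect = {i: i % 3 for i in ring}   # colours 0,1,2,0,1,2 around the ring
uniform = {i: 0 for i in ring}       # every vertex holds the same colour

print(total_unsatisfaction(ring, perfect, 3))   # every vertex reaches all 3 colours
print(total_unsatisfaction(ring, uniform, 3))   # every vertex misses 2 colours
```

Note that the perfect assignment above is also a proper colouring in the usual sense, but the converse does not hold in general: a proper colouring only forbids equal colours on an edge, whereas here every colour must be present within one edge distance.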

The main goal of this paper is to introduce the problem, and to investigate the 
behaviour of two algorithms on realistic system sizes. The analysis of the model in the
sense of a phase diagram for infinite system size is a separate issue that is currently
being investigated. 

2. Belief propagation 

Belief propagation, a non-local algorithm also called the sum-product algorithm, 
relies on iterative message passing and provides near optimal performance at low 
computational cost in a wide range of applications. Message passing techniques
rely on conditional probabilistic messages passed from the immediate neighbourhood to 
find the most probable assignment of states to variables given the constraints. In our 
problem, the constraints correspond to the fact that at each vertex, one should be able 
to retrieve $\min(q, \lambda_i + 1)$ colours from the vertex itself and its first order neighbours.

‡ It should be mentioned that different systems of similar topology have been investigated in [?].

Figure 1. (a) A graph representing the successive local constraints. (b) A bipartite
graph representation of the problem.

These constraints can be represented by clusters of vertices on a graph as in Fig. 1,
where 'A', 'B1', 'B2', 'C11' and 'C12' correspond to the vertices, while 'Z_A', 'Z_B1', 'Z_B2',
'Z_C11' and 'Z_C12' are check variables corresponding to the constraints. The checks are
variables related to the unsatisfaction of a given colour assignment for a node 'A' and
its direct neighbours

$$P(Z_A \mid A, \{B_i\}) = e^{-\beta (q - q_A)} , \qquad (2)$$

where $q_A$ is the number of available colours to A and $\beta$ is fixed to 20.
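
To give a feeling for how strongly a value of 20 penalises missing colours, the short sketch below evaluates the check weight for q = 4. It assumes that the weight has the form exp(-beta x number of missing colours); the function name is hypothetical.

```python
import math


def check_weight(q, q_available, beta=20.0):
    """Assumed check weight exp(-beta * (q - q_available))."""
    return math.exp(-beta * (q - q_available))


# Weights for a check that sees 4, 3, 2 or 1 of the q = 4 colours.
for q_available in (4, 3, 2, 1):
    print(q_available, check_weight(4, q_available))
```

A single missing colour already reduces the weight by a factor exp(-20), which is of order 10^-9, so with this choice the checks behave almost as hard constraints while keeping all probabilities strictly positive.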

The graph can be transformed into a bipartite graph, which separates the vertices 
from the checks. The following update rules can be easily obtained by naively adapting 
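the original belief propagation rules [?]. Thus, the messages from a check to a vertex
are given by

$$P(Z_A \mid A) = \sum_{\{B_i\}} P(Z_A \mid A, \{B_i\}) \, P(\{B_i\} \mid \{Z_{B_i}\}, \{Z_{C_{ij}}\})$$

$$\approx \sum_{\{B_i\}} P(Z_A \mid A, \{B_i\}) \prod_i P(B_i \mid Z_{B_i}, \{Z_{C_{ij}}\}) , \qquad (3)$$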

while the message from a vertex to a check is given by 

$$P(A \mid \{Z_{B_i}\}) = \alpha_A \prod_i P(Z_{B_i} \mid A) . \qquad (4)$$

Finally, the pseudo-posterior is given by 

$$P(A \mid Z_A, \{Z_{B_i}\}) = \alpha \, P(Z_A \mid A) \prod_i P(Z_{B_i} \mid A) , \qquad (5)$$

where $\alpha_A$ and $\alpha$ are normalisation coefficients. Note that the factorisation in (3) is a
relatively crude approximation even in the large system limit, as the $\{B_i\}$ are correlated.
To deal with this properly, a more advanced analysis using a cluster expansion [?] is
currently being undertaken. Nevertheless, as we will see, these approximations work
remarkably well.
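
The factorised check-to-vertex update can be sketched by brute-force enumeration over the colours of the other vertices attached to the check. This is only an illustration under stated assumptions: the incoming messages are treated as independent (the crude factorisation discussed above), the check weight is taken to be exp(-beta x missing colours), and the function name and data layout are hypothetical. The cost grows as q to the power of the number of neighbours, so it is practical only for the small connectivities considered here.

```python
import itertools
import math


def check_to_vertex_message(incoming, q, beta=20.0):
    """Message from a check to one of its vertices, by brute-force enumeration.

    incoming: one probability vector (length q) for each *other* vertex
              attached to the check.
    Returns a normalised vector over the q colours of the target vertex.
    """
    n_members = len(incoming) + 1
    target = min(q, n_members)            # best achievable number of colours
    message = []
    for a in range(q):                    # colour of the target vertex
        total = 0.0
        for others in itertools.product(range(q), repeat=len(incoming)):
            missing = target - len({a, *others})
            prob = math.prod(p[b] for p, b in zip(incoming, others))
            total += math.exp(-beta * missing) * prob
        message.append(total)
    norm = sum(message)
    return [m / norm for m in message]


# Two neighbours that are almost surely coloured 0 and 1, with q = 3:
# the message pushes the target vertex towards the missing colour 2.
msg = check_to_vertex_message([[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]], q=3)
print(msg)
```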
If convergence of the BP algorithm is reached, the colours of vertices whose (pseudo)
posterior is greater than a pre-defined threshold (set at 0.9 in our experiments) can be
fixed. If no such high posterior exists, then the vertex with the highest posterior value
has its colour fixed. Then, the update rules are re-iterated and the decimation process 
repeated until a global colouring is extracted. 
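
One decimation round as described above can be sketched as follows. The dictionary layout and the function name are assumptions for illustration; the sketch expects at least one free vertex.

```python
def decimation_step(posteriors, threshold=0.9):
    """Choose which vertices to fix in one decimation round.

    posteriors: dict mapping each free vertex to its list of colour probabilities.
    Returns a dict vertex -> colour to be fixed.
    """
    # Most probable colour of every free vertex.
    best = {v: max(range(len(p)), key=p.__getitem__) for v, p in posteriors.items()}
    confident = {v: c for v, c in best.items() if posteriors[v][c] > threshold}
    if confident:
        return confident
    # No posterior exceeds the threshold: fix only the most biased vertex.
    v = max(best, key=lambda u: posteriors[u][best[u]])
    return {v: best[v]}


print(decimation_step({0: [0.95, 0.05], 1: [0.6, 0.4], 2: [0.02, 0.98]}))
print(decimation_step({0: [0.5, 0.5], 1: [0.3, 0.7]}))
```

After each round the fixed vertices are removed from the set of free variables and the message updates are iterated again on the reduced problem.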

A major drawback of the BP algorithm is that convergence is not guaranteed. Since
all the probabilities are initialised randomly, different parts of the graph may converge
to different local solutions. Then, parts with incompatible colourings will continue to
compete with each other, resulting in non-convergence.

Time averaging is a way of getting around the problem by carrying out the
decimation and colour fixing process according to the average posterior (over time, i.e. over a
number of iterations) instead of the instantaneous posterior. In the case of non-convergence,
this method will decimate the vertex with the strongest average colouring probability 
over all competing solutions and thus reduce the fluctuations due to the competition. 
After several trials, a time window of 30 iterations was chosen for all numerical data 
presented here. This turned out to give the best balance between the quality of the 
obtained colourings and the computational cost. 
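
A sliding-window average of this kind can be kept per vertex with a bounded queue. The class below is a minimal sketch, assuming one posterior vector is recorded per BP iteration; the class and method names are hypothetical, and `average` should only be called after at least one update.

```python
from collections import deque


class TimeAveragedPosterior:
    """Running average of a vertex posterior over the last `window` iterations."""

    def __init__(self, q, window=30):
        self.q = q
        self.history = deque(maxlen=window)   # oldest entries drop out automatically

    def update(self, posterior):
        self.history.append(list(posterior))

    def average(self):
        n = len(self.history)
        return [sum(p[c] for p in self.history) / n for c in range(self.q)]


# A posterior that keeps oscillating between two competing solutions.
tracker = TimeAveragedPosterior(q=2, window=30)
for t in range(30):
    tracker.update([0.9, 0.1] if t % 2 == 0 else [0.2, 0.8])
print(tracker.average())
```

In this toy case the instantaneous posterior never settles, but the averaged one shows a stable, slight preference for the first colour, which is the quantity the decimation would then act on.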

We have opted for BP combined with time averaging, as for the ultimate task of 
distributed storage we have in mind a scenario in which nodes may suddenly switch 
off and turn back on again (as is often the case in peer-to-peer networks). Then time
averaging may have a significant benefit over other algorithms, being able to take into
account the average probability that a node is able to provide a certain segment (or not).
This aspect, however, has not been included in the current paper.

3. Walksat 

Walksat is a local search algorithm, originally designed to solve the problem of finding 
variable assignments that satisfy as many clauses as possible of a given conjunctive 
normal form [?]. Although Walksat local search may seem to be suboptimal at first
sight, studies have shown it to be a powerful tool [?]. Many variants of the original
algorithm exist [?]. In this study, we have adapted the variant referred to as
SKC, for solving this specific colouring problem. 

The original Walksat heuristic (SKC) uses the notion of the breakcount of a variable, 
which is the number of clauses that are currently satisfied, but would become unsatisfied 
if the variable assignment were to be changed. The SKC variable selection is as follows: 

(i) If there are variables with breakcount equal to 0, randomly select one such variable. 

(ii) Otherwise 

• with probability p randomly select a variable. 

• with probability 1 — p randomly select a variable with minimal breakcount. 

(iii) Flip the selected variable. 

(iv) Repeat until all clauses are satisfied or until the maximal number of iterations is reached.
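
The selection steps (i) and (ii) above can be sketched as a single function. In the usual Walksat setting the candidate variables are those of a randomly chosen unsatisfied clause; this detail, the function name and the dictionary of breakcounts are assumptions of the sketch rather than statements about the implementation used here.

```python
import random


def skc_select(candidates, breakcount, p, rng=random):
    """Pick the variable to flip following the SKC rule.

    candidates: variables eligible for flipping.
    breakcount: dict variable -> number of clauses the flip would break.
    p:          probability of a purely random (noisy) choice in step (ii).
    """
    free = [v for v in candidates if breakcount[v] == 0]
    if free:                                   # step (i): a flip that breaks nothing
        return rng.choice(free)
    if rng.random() < p:                       # step (ii), random choice
        return rng.choice(list(candidates))
    lowest = min(breakcount[v] for v in candidates)
    return rng.choice([v for v in candidates if breakcount[v] == lowest])


print(skc_select(["x", "y", "z"], {"x": 2, "y": 0, "z": 1}, p=0.5))
```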
In our problem, the breakcount of a variable is given by the number of vertices for
which the change of assignment would decrease $q_i$. Hence, the breakcount now
depends on the replacement colour. In the first step (i) of the SKC procedure, the
selected replacement colour is the one which leads to a breakcount equal to 0 (if more
than one, choose one randomly). In the second step (ii), a replacement colour is selected
at random. In our first few attempts, this adaptation of the Walksat algorithm showed
mixed results, which were up to 50% worse than those obtained with the BP algorithm.
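
The colour-dependent breakcount can be computed locally, since only a vertex and its direct neighbours see its colour. The sketch below is one possible reading of the definition above; the function name and the adjacency-list layout are assumptions.

```python
def colour_breakcount(v, new_colour, adj, colour):
    """Number of vertices whose count of available colours would decrease
    if vertex v were recoloured to new_colour."""
    old = colour[v]
    if new_colour == old:
        return 0
    broken = 0
    for i in [v] + list(adj[v]):               # checks that contain v
        others = [u for u in [i] + list(adj[i]) if u != v]
        loses_old = all(colour[u] != old for u in others)
        gains_new = all(colour[u] != new_colour for u in others)
        if loses_old and not gains_new:        # old colour lost, nothing gained
            broken += 1
    return broken


# Hypothetical path 0 - 1 - 2 with three different colours.
path = {0: [1], 1: [0, 2], 2: [1]}
colours = {0: 0, 1: 1, 2: 2}
print(colour_breakcount(2, 0, path, colours))  # only vertex 1 loses a colour
print(colour_breakcount(2, 1, path, colours))  # vertices 1 and 2 both lose one
```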

Therefore, we adapted another local search algorithm [?] related to Walksat. This
other algorithm is also iterative and based on a mixture of gradient and "noisy" descent.
At each iteration, one of these two descents is chosen at random, with some probability. 
Similarly to the Walksat algorithm, this step is repeated until all checks are satisfied, 
or until the maximal number of iterations is reached. 

The gradient descent is operated by the algorithm named GSAT [?], which during
an iteration changes the assignment of the variable that leads to the greatest decrease in
the total number of unsatisfied clauses. In our problem, the change will be made such that
it leads to the greatest decrease in unsatisfaction as defined in (1). The "noisy" move
of the original algorithm [?] is replaced here by the Walksat SKC heuristic. Therefore,
the resulting algorithm is a mixture between SKC and GSAT, which is parameterised
by a probability $P_m$ that is set to 0.5. Our experiments show that this mixture between
SKC and GSAT performs significantly better than SKC alone with no increase of the
computational cost.
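
The structure of one iteration of such a mixture can be sketched as below. The greedy move recomputes the total unsatisfaction from scratch for every candidate recolouring, which is simple but far from efficient; a practical implementation would update it locally. Which of the two moves the mixing probability refers to is not specified here, so the convention used (probability `p_mix` for the greedy move) is an assumption, as are all names. The sketch expects q >= 2.

```python
import random


def unsat(adj, colour, q):
    """Total number of missing colours over all vertices."""
    return sum(
        min(q, len(adj[v]) + 1) - len({colour[v], *(colour[u] for u in adj[v])})
        for v in adj
    )


def gsat_move(adj, colour, q):
    """Greedy move: the single recolouring that lowers the unsatisfaction most."""
    current, best = unsat(adj, colour, q), None
    for v in adj:
        old = colour[v]
        for c in range(q):
            if c == old:
                continue
            colour[v] = c                       # try the recolouring
            gain = current - unsat(adj, colour, q)
            if best is None or gain > best[0]:
                best = (gain, v, c)
        colour[v] = old                         # restore the original colour
    return best                                 # (decrease, vertex, new colour)


def mixed_step(adj, colour, q, noisy_move, p_mix=0.5, rng=random):
    """Apply either a greedy GSAT move or the supplied noisy (SKC-like) move."""
    if rng.random() < p_mix:
        _, v, c = gsat_move(adj, colour, q)
        colour[v] = c
    else:
        noisy_move(adj, colour, q, rng)         # user-supplied, modifies colour in place


ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
col = {i: 0 for i in ring}
print(unsat(ring, col, 3), gsat_move(ring, col, 3))
```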

If not all checks are satisfied at the maximal number of iterations, the GSAT
algorithm is iterated until a local minimum is reached. The choice of the maximal number of
iterations is discussed in the next section. The resulting algorithm, referred to as Walksat
from here on, gives results that are qualitatively similar to the ones obtained using the BP
algorithm, both in terms of the measure of unsatisfaction $E^u(\lambda)$ and of the number of perfect
colourings found.

4. Simulations 

The experiments are carried out for $q = 4$ colours and based on two system sizes ($n$):
graphs of 100 and 1000 vertices. The studied graphs have an average connectivity $\lambda$
and are of two types, referred to as (cut-) Poissonian and linear graphs:

• The vertices of the (cut-) Poissonian graphs with average local connectivity $\lambda$ have
local connectivities $\lambda_i$ given by

$$\lambda_i = q - 1 + Z_{\lambda - q + 1} , \qquad (6)$$

where $Z_{\lambda - q + 1}$ is randomly drawn from a Poisson distribution with parameter $\lambda - q + 1$.

• The vertices of so-called linear graphs with average local connectivity $\lambda$ have local
connectivities $\lambda_i$ given by

$$\lambda_i = \lfloor \lambda \rfloor + Z_{\lambda - \lfloor \lambda \rfloor} , \qquad (7)$$

where $\lfloor \lambda \rfloor$ is the largest integer smaller than or equal to $\lambda$, and where $Z_{\lambda - \lfloor \lambda \rfloor}$ is 1 with
probability $\lambda - \lfloor \lambda \rfloor$ and 0 otherwise.
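
The two connectivity profiles can be sampled as in the sketch below, assuming the cut-Poissonian connectivity is q - 1 plus a Poisson variable of mean lambda - q + 1, and the linear one is the integer part of lambda plus a Bernoulli variable. Only the local connectivities are drawn; how the vertices are then wired into a graph is not covered. The function names and the use of Knuth's method for the Poisson draw are choices made for this example.

```python
import math
import random


def poisson(mu, rng=random):
    """Draw a Poisson variate with mean mu (Knuth's method, fine for small mu)."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def cut_poissonian_degrees(n, lam, q, rng=random):
    """Local connectivities q - 1 + Z, with Z Poisson of mean lam - q + 1."""
    return [q - 1 + poisson(lam - q + 1, rng) for _ in range(n)]


def linear_degrees(n, lam, rng=random):
    """Local connectivities floor(lam) + Z, Z = 1 with probability lam - floor(lam)."""
    base = math.floor(lam)
    return [base + (1 if rng.random() < lam - base else 0) for _ in range(n)]


rng = random.Random(0)
poissonian = cut_poissonian_degrees(1000, 4.4, 4, rng)
linear = linear_degrees(1000, 3.8, rng)
print(min(poissonian), sum(poissonian) / 1000, sorted(set(linear)))
```

Both constructions have mean connectivity lambda, but the linear graphs only mix two adjacent connectivity values, whereas the cut-Poissonian graphs have a long tail above the minimum connectivity q - 1.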

We study the most interesting range of average connectivities, from $\lambda = 3$ to $\lambda = 5$ with
a step of 0.1. For each $\lambda$, 1000 graphs of each type were randomly generated and then
coloured by both the BP and Walksat algorithms. 

4.1. Graph characteristics

The graphs and the constraints stem from the original problem we set out to solve,
namely distributed storage. Here, we shall point out two observations that may help 
in getting insight into the characteristics of the problem and solutions found by the 
algorithms. 

The number of checks is always equal to the number of vertices. Indeed, each vertex 
is associated with a check, that connects it to all vertices at one edge distance. This 
check is obeyed when the vertex can retrieve all possible colours from vertices at one 
edge distance. 

The second observation is related to the fact that edges are not directed: if the 
vertex 'B' is connected to the check of 'A', then the vertex 'A' is also connected to the 
check of 'B'. Hence, there are always $n\lambda/2$ short loops, which corresponds to the number
of edges, in the belief network even in the large system limit. When the connectivity
value $\lambda$ increases, the number of loops increases as well, but it also becomes easier to
get a lower value for the average unsatisfaction. Therefore, it is not clear whether the
influence of the presence of loops on the performance of the (current) BP algorithm will
increase or decrease with $\lambda$.

4.2. Walksat performance

In the Walksat algorithm, the maximal number of iterations nbit is an important 
parameter. A greater value increases performance, but also computational cost. 
Unfortunately, the relation between performance and cost is not linear and it is therefore 
difficult to estimate the optimal number of iterations. In order to understand this 
relation, we carried out several simulations with different values of nbit for the two
system sizes and all connectivity values.

Figures 2 and 3 show the results obtained for a system size of 100 vertices and a
range of limits on the number of iterations. One notices that the improvements in terms of
unsatisfaction and perfect colouring are negligible for $\lambda < 3.8$ for linear graphs and for
$\lambda < 4.1$ for Poissonian graphs. In these regions, no perfect solutions are actually found.
Hence, the Walksat algorithm only stops when the maximum number of iterations
is reached and returns the unsatisfaction of the nearest local minimum. If the lowest
unsatisfaction value over all examined colour assignments were returned instead, then
one could expect improved results for larger values of nbit.

For $\lambda > 3.8$ and $\lambda > 4.1$ for linear and Poissonian graphs respectively, when
some perfect colouring solutions exist and are found, increasing nbit does actually
decrease the unsatisfaction value. However, a larger number of vertices will also
require an exponentially larger number of iterations to achieve the same performance.
Ultimately, if nbit were infinite, Walksat would find perfect colouring solutions.

Figure 2. Walksat performance on linear graphs (n=100) for various nbit (from
125K to 120M iterations) and connectivity values $\lambda$. (a) The unsatisfaction measure.
(b) Percentage of perfect colouring solutions.

Figure 3. Walksat performance on Poissonian graphs (n=100) for various nbit (from
125K to 12M iterations) and connectivity values $\lambda$. (b) Percentage of perfect colouring
solutions.

Hence, to compare performance of the Walksat and BP algorithms, we take the 
results achieved by Walksat for roughly the same computational time as the one used by 
the BP algorithm. This means nbit = 500K and nbit = 12M iterations for system sizes
of 100 and 1000 vertices, respectively. We also amend the Walksat algorithm described
in Sec. 3 such that the unsatisfaction returned is the lowest unsatisfaction value
over all examined colour assignments and not the one corresponding to the nearest local
minimum.
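
Returning the best assignment seen so far, rather than the last one, only requires a little bookkeeping around the search loop. The generic sketch below assumes that states are immutable (or copied by `propose`) and that an energy of 0 corresponds to a perfect colouring; all names are hypothetical.

```python
def search_with_best(initial_state, energy, propose, max_iters):
    """Run a local search and return the lowest-energy state ever visited."""
    state = initial_state
    best_state, best_energy = state, energy(state)
    for _ in range(max_iters):
        if best_energy == 0:                   # perfect solution: stop early
            break
        state = propose(state)                 # one move of the local search
        e = energy(state)
        if e < best_energy:                    # remember improvements only
            best_state, best_energy = state, e
    return best_state, best_energy


# Toy illustration: integer states visited in the cycle 5, 6, 3, 2, 5, ...
print(search_with_best(5, lambda s: s, lambda s: (s * 7 + 1) % 10, 10))
```

In the toy run the search ends on a state of higher energy than the best one it visited, which is exactly the situation the amendment is meant to handle.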
Figure 4. Comparison of belief propagation and Walksat algorithms for linear
graphs. (a) Unsatisfaction measure. (b) Percentage of perfect colouring. Note that
the percentage of perfect colouring solutions is 0 for Walksat on graphs with 1000
vertices.




4.3. Comparison of the performance of the BP and Walksat algorithms

Figures 4 and 5 show first that the BP algorithm is overall outperformed by the Walksat
algorithm. We believe that the presence of the small loops discussed in Sec. 4.1 is
the main cause that degrades the performance of the BP algorithm. The generalised belief
propagation method [?] is currently being investigated to obtain improved performance.
However, the results also show that the approximate BP algorithm works surprisingly well
considering the crude approximation made in (3).

Then, it can be noticed that while Walksat clearly outperforms BP for 100-node
systems, both in terms of unsatisfaction and of perfect colouring, this is not the case
when 1000-node systems are considered. Indeed, the results obtained by BP seem
better in terms of perfect colouring in that case for both linear and Poissonian graphs.
Obviously, if nbit were to be increased significantly, the Walksat algorithm would
naturally outperform BP as for 100-node systems. However, for an increasing system
size and given computing resources, BP is likely to outperform Walksat.

5. Discussion 

We have studied a variation of graph colouring on random graphs of finite average 
connectivity, aimed at maximising the number of colours accessible by a vertex within 
one edge distance. The methods have significant practical relevance, especially in the area
of distributed storage, which can be mapped onto this problem. Two efficient algorithms,
belief propagation and Walksat, have been adapted to carry out the task.

We have presented experimental results based on two types of random graphs for 
different system sizes and identified the transition point, in terms of the connectivity, 
for both algorithms to find a perfect solution. For $q = 4$ colours, we have found that the
critical connectivity is around $\lambda = 4$ for linear graphs and around $\lambda = 4.4$ for Poissonian
graphs. In principle, the methods presented here can be used for random graphs of any 
connectivity profile and any number of colours. 

We have found that both algorithms give qualitatively very similar results, and 
that the overall computer time needed to generate all the data presented here was 
roughly the same for both algorithms. The relative efficiency of both algorithms, in 
terms of the quality of obtained solutions and computing time, does however depend 
on the combination of parameters ($\lambda$, $q$, $n$) and graph characteristics. A more detailed
analysis of this will be the subject of a separate study, as will be the thermodynamic 
phase diagram for this model. Further research will focus on improving the message 
passing approach by using the exact cluster expansion in the large system limit (i.e. 
focusing on stars and edges as our fundamental clusters instead of stars and nodes), 
combined with generalised BP, which will also provide us with a phase diagram for 
the model. It is assumed that this approach will remove the influence of short loops
and therefore improve the performance of the algorithm, especially at low connectivity 
values. 

If replica symmetry turns out to be broken, a Survey Propagation [16] like algorithm
may further improve the results. Furthermore, we will consider increasing the distance
to which a vertex can extend its search to retrieve the missing colours (e.g. second 
nearest neighbours), which will obviously change the basic clusters needed in the cluster 
expansion. 